Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude the MISRA Website from CI-CD link verifier checks #91

Merged
merged 6 commits into from
Nov 8, 2023

Conversation

Skptak
Copy link
Member

@Skptak Skptak commented Nov 8, 2023

No description provided.

@Skptak
Copy link
Member Author

Skptak commented Nov 8, 2023

This fixes the issue seen in FreeRTOS/FreeRTOS-Kernel#880
Putting the fix here so that all the repos that contain this link will receive this change instead of opening a per repo PR

@Skptak Skptak force-pushed the main branch 2 times, most recently from 32c8041 to 4a98c0d Compare November 8, 2023 20:27
…sequence. Also trying to fix the issue with trailing comas and slashes being counted as part of the URLs.
@@ -160,7 +160,7 @@ jobs:
org: AWS,
branch: main,
run-link-verifier: true,
run-complexity: true,
run-complexity: false,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jobs is undergoing changes still
For now skip running build related tests.

@@ -210,7 +210,7 @@ jobs:
with:
path: repo/${{ matrix.inputs.repository }}
exclude-dirs: complexity, formatting
exclude-urls: https://dummy-url.com/ota.bin
exclude-urls: https://dummy-url.com/ota.bin, https://s3.region.amazonaws.com/joe-ota
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jobs has a new URL for its own tests, add to this list.

@@ -20,6 +20,7 @@ inputs:
exclude-urls:
description: 'Comma separated list of URLS not to check'
required: false
default: https://www.misra.org.uk/misra-c, https://www.misra.org.uk
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Placing these here just so that if not adding them explicitly it shows up the in action log

# now has a CAPTCHA landing page, as such always exclude it from this check.
touch allowList.txt
echo "https://www.misra.org.uk/misra-c" >> allowList.txt
echo "https://www.misra.org.uk" >> allowList.txt
Copy link
Member Author

@Skptak Skptak Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea here is to make this fix more "portable" as any repos that are already using the above parameters would need an individual PR

Adding the exclusion of these URLs means that we only need to make this change in this repo, not ever repo

# Test that it will find this url and drop the slash
https://www.google.com/
# Test that it will find this url by dropping the coma
https://www.google.com,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test to make sure we don't try and use the trailing coma as part of the URL, currently an issue with the checker.

@@ -14,7 +14,7 @@
import traceback
from collections import defaultdict

MARKDOWN_SEARCH_TERM = r'\.md$'
MARKDOWN_SEARCH_TERM = r"\.md$"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still throws a warning about it not being a valid escape sequence? But not sure what it wants.

@@ -14,7 +14,7 @@
import traceback
from collections import defaultdict

MARKDOWN_SEARCH_TERM = r'\.md$'
MARKDOWN_SEARCH_TERM = r"\.md$"
# Regex to find a URL
URL_SEARCH_TERM = r'(\b(https?)://[^\s\)\]\\"<>]+[^\s\)\.\]\\"<>])'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make sense to update this regex to exclude the trailing slash, or coma, but I honestly have no idea how this even matches currently.

@@ -151,7 +151,11 @@ def identify_broken_links(self, files, verbose):
cprint(f'\t{link}','green')

for link in self.external_links:
is_broken, status_code = test_url(link)
# Remove the trailing slash or trailing coma
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing slash - So that we don't do a duplicate search of <URL>/ and <URL>

Trailing coma - There are a few places I've seen links be put into files like this
/* <COMMENT> <URL>, <MORE COMMENT> */
Where this then breaks the URL checker, since the current regex grabs the coma

@@ -166,7 +170,7 @@ def parse_file(html_file):
return HtmlFile(html_file)

def html_name_from_markdown(filename):
md_pattern = re.compile('\.md', re.IGNORECASE)
md_pattern = re.compile("\.md$", re.IGNORECASE)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this uses this, compared to the global one at the top of the file, but just tried using quotes to see if that helped with the warning

Added the $ to make sure it only looks for files that fully end in .md

@@ -254,7 +258,10 @@ def fetch_issues(repo, issue_type, limit):
if process.returncode == 0:
key = issue_type + 's'
for issue in process.stdout.split():
main_repo_list[repo][key].add(int(issue))
if(issue.isnumeric()):
Copy link
Member Author

@Skptak Skptak Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If for some reason the GitHub issues can't be accessed this will throw an error attempting to convert it to an int

Check if we have an actual number first, if we do not that means there was an error reading the actual Issues from the repo. When this occurs it returns an output of:

  Stdout = ['gh:', 'To', 'use', 'GitHub', 'CLI', 'in', 'a', 'GitHub', 'Actions', 'workflow,', 'set', 'the', 'GH_TOKEN', 'environment', 'variable.', 'Example:', 'env:', 'GH_TOKEN:', '${{', 'github.token', '}}']

@@ -347,7 +353,7 @@ def main():
if any(file.endswith(file_type) for file_type in args.include_files):
f_path = os.path.join(root, file)
if args.verbose:
print("Processing File: {}".format(f_path))
print("\nProcessing File: {}".format(f_path))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a newline just to help space the log out when doing a verbose run.

if ( ( link[-1] == "/" ) or ( link[-1] == "," ) ):
is_broken, status_code = test_url(link[:-1])
else:
is_broken, status_code = test_url(link)
if is_broken:
broken_links.append(link)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentionally use the version of the URL with the coma or slash for the error list. This way the exact link can be searched for in the source file easily.

@Skptak Skptak merged commit b2be421 into FreeRTOS:main Nov 8, 2023
50 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants